CHAPTER 10

Working with Strings

Within the .NET Base Class Library (BCL), the System.String type is the model citizen. It offers an ideal example of how to create an immutable reference type that semantically acts like a value type.

String Overview

Instances of String are immutable: once you create one, you cannot change it. Although this may seem inefficient at first, it actually makes code more efficient. When you copy String references liberally within an application, each copy points to the same raw string data as the source, so no character data is duplicated. Even if you call the ICloneable.Clone method on a string, you get a reference that points to the same string data as the source. This is entirely safe because the String public interface offers no way to modify the actual String data. If you require a string that is a deep copy of the original string, you may call the Copy method to do so.
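As a minimal sketch of the sharing behavior just described, the following program checks which operations hand back the same instance and which produce a new one. The variable names and the DeepCopy helper are illustrative and not part of the original text. Because String.Copy is marked obsolete in recent .NET versions, the sketch assumes that building a string from a character array is an acceptable way to obtain a separate copy.

```vb
Imports System

Public Class EntryPoint
    ' Hypothetical helper: builds a separate instance with the same characters.
    Public Shared Function DeepCopy(ByVal source As String) As String
        Return New String(source.ToCharArray())
    End Function

    Shared Sub Main()
        Dim original As String = "immutable"
        Dim sameRef As String = original                ' copies the reference only
        Dim cloned As String = CStr(original.Clone())   ' Clone returns the same instance
        Dim upper As String = original.ToUpperInvariant() ' "changing" a string yields a new one
        Dim deep As String = DeepCopy(original)

        Console.WriteLine("sameRef: {0}", _
            Object.ReferenceEquals(original, sameRef))   ' True
        Console.WriteLine("cloned:  {0}", _
            Object.ReferenceEquals(original, cloned))    ' True
        Console.WriteLine("upper:   {0}", _
            Object.ReferenceEquals(original, upper))     ' False
        Console.WriteLine("deep:    {0}", _
            Object.ReferenceEquals(original, deep))      ' False

        ' The source string is untouched by ToUpperInvariant.
        Console.WriteLine(original)                      ' immutable
    End Sub
End Class
```

Note that no call in the sketch alters the original string; every operation that appears to modify it returns a different String instance instead.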

Note: Those of you who are familiar with common design patterns and idioms may recognize this usage pattern as the handle/body or envelope/letter idiom.

In many environments, such as C++ and C, the string is not a built-in type at all, but rather a more primitive, raw construct, such as a pointer to the first character in an array of characters. Typically, string-manipulation routines are not part of the language but rather a part of a library used with the language. Although that is mostly true with Visual Basic (VB), the lines are somewhat blurred by the .NET runtime. The designers of the Common Language Infrastructure (CLI) specification could have chosen to represent all strings as simple arrays of System.Char types, but instead they chose to annex System.String into the collection of built-in types. In fact, System.String is an oddball in the built-in type collection, since it is a reference type, and most of the built-in types are value types. However, this difference is blurred by the fact that the String type behaves with value semantics.
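The value semantics mentioned above can be illustrated with a short sketch. It assumes the default Option Compare Binary setting and uses two strings built at run time so that they are guaranteed to be separate objects; the variable names are illustrative.

```vb
Imports System

Public Class EntryPoint
    Shared Sub Main()
        ' Two strings built at run time: same characters, separate objects.
        Dim first As New String("x"c, 3)
        Dim second As New String("x"c, 3)

        ' The = operator and Equals compare contents, as with a value type.
        Console.WriteLine("first = second:       {0}", first = second)        ' True
        Console.WriteLine("first.Equals(second): {0}", first.Equals(second))  ' True

        ' The Is operator compares references, revealing the reference type underneath.
        Console.WriteLine("first Is second:      {0}", first Is second)       ' False
    End Sub
End Class
```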

You may already know that the System.String type represents a Unicode character string, and System.Char represents a 16-bit Unicode character (more precisely, a single UTF-16 code unit). This makes portability and localization to other operating systems, especially systems with large character sets, considerably easier. However, sometimes you might need to interface with external systems that use other encodings. For times like these, you can employ the System.Text.Encoding class to convert to and from various encodings, including ASCII, UTF-7, UTF-8, and UTF-32. Incidentally, the Unicode format used internally by the runtime is UTF-16.
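The following sketch shows one way to use System.Text.Encoding for such conversions. The RoundTrip helper and the sample word are illustrative assumptions; ChrW comes from the Microsoft.VisualBasic namespace, which VB projects import by default.

```vb
Imports System
Imports System.Text

Public Class EntryPoint
    ' Hypothetical helper: encodes a string to bytes and decodes it again.
    Public Shared Function RoundTrip(ByVal source As String, _
                                     ByVal enc As Encoding) As String
        Return enc.GetString(enc.GetBytes(source))
    End Function

    Shared Sub Main()
        ' "Resume" with accented e (U+00E9), written via ChrW so the
        ' source file itself stays plain ASCII.
        Dim source As String = "R" & ChrW(&HE9) & "sum" & ChrW(&HE9)

        ' The same six characters occupy different byte counts per encoding.
        Console.WriteLine("UTF-16 bytes: {0}", _
            Encoding.Unicode.GetBytes(source).Length)   ' 12
        Console.WriteLine("UTF-8 bytes:  {0}", _
            Encoding.UTF8.GetBytes(source).Length)      ' 8
        Console.WriteLine("UTF-32 bytes: {0}", _
            Encoding.UTF32.GetBytes(source).Length)     ' 24

        ' UTF-8 can represent every character; ASCII cannot represent U+00E9.
        Console.WriteLine("UTF-8 round trip intact: {0}", _
            RoundTrip(source, Encoding.UTF8) = source)  ' True
        Console.WriteLine("ASCII round trip intact: {0}", _
            RoundTrip(source, Encoding.ASCII) = source) ' False
    End Sub
End Class
```

The ASCII round trip fails because characters outside the ASCII range are replaced during encoding, so a lossy target encoding should be chosen only when the external system requires it.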

String Literals

When you declare a string literal in your code, the compiler records it in the module, and the runtime places the corresponding System.String object into an internal table called the intern pool. The idea is that each time a string literal appears in your code, it is checked against the literals already declared elsewhere, and if the same string exists, the code simply references the one already interned. Let’s take a look at an example of ways to declare a string literal:

Imports System

Public Class EntryPoint
    Shared Sub Main(ByVal args As String())
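        ' Two identical literals refer to the same interned String instance.
        Dim lit1 As String = "c:\windows\system32"
        Dim lit2 As String = "c:\windows\system32"

        ' VB has no escape sequences in literals; use vbCrLf for line breaks.
        Dim lit3 As String = vbCrLf & "Jack and Jill" & vbCrLf & _
            "Went up the hill..." & vbCrLf

        Console.WriteLine(lit3)

        Console.WriteLine("Object.RefEq(lit1, lit2): {0}", _
            Object.ReferenceEquals(lit1, lit2))

        If args.Length > 0 Then
            Console.WriteLine("Parameter given: {0}", args(0))

            ' Add the run-time string to the intern pool, or get the
            ' existing interned instance if an equal string is already there.
            Dim strNew As String = String.Intern(args(0))

            Console.WriteLine("Object.RefEq(lit1, strNew): {0}", _
                Object.ReferenceEquals(lit1, strNew))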
        End If
    End Sub
End Class

Here’s the output from the previous example:
Jack and Jill
Went up the hill...

Object.RefEq(lit1, lit2): True
Parameter given: This is an IP address: 123.124.125.126
Object.RefEq(lit1, strNew): False

Note: To run this example, you must supply a command-line argument in your project properties. On the Debug tab, enter "This is an IP address: 123.124.125.126" in the Start Options area, including the double quotes so that the whole sentence is passed as a single argument.
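As a complementary sketch that needs no command-line argument, the following program builds a string at run time whose contents match a literal. It is an illustration under the assumption that the standard runtime interns the literal as described above; the variable names are hypothetical. On that assumption, the last line is expected to report that String.Intern returned the literal's instance.

```vb
Imports System

Public Class EntryPoint
    Shared Sub Main()
        Dim literal As String = "hello world"

        ' Built from a character array at run time, so this is a fresh
        ' instance rather than a reference into the intern pool.
        Dim built As New String("hello world".ToCharArray())

        Console.WriteLine("Same contents: {0}", literal = built)          ' True
        Console.WriteLine("Same instance: {0}", _
            Object.ReferenceEquals(literal, built))                        ' False

        ' String.Intern looks up an equal string in the pool and returns it.
        Dim interned As String = String.Intern(built)
        Console.WriteLine("Same instance after String.Intern: {0}", _
            Object.ReferenceEquals(literal, interned))
    End Sub
End Class
```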